44 research outputs found
Evolutionary nonnegative matrix factorization for data compression
This paper aims at improving non-negative matrix factorization (NMF) to facilitate data compression. An evolutionary updating strategy is proposed to solve the NMF problem iteratively based on three sets of updating rules including multiplicative, firefly and survival of the fittest rules. For data compression application, the quality of the factorized matrices can be evaluated by measurements such as sparsity, orthogonality and factorization error to assess compression quality in terms of storage space consumption, redundancy in data matrix and data approximation accuracy. Thus, the fitness score function that drives the evolving procedure is designed as a composite score that takes into account all these measurements. A hybrid initialization scheme is performed to improve the rate of convergence, allowing multiple initial candidates generated by different types of NMF initialization approaches. Effectiveness of the proposed method is demonstrated using Yale and ORL image datasets.
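Of the three rule sets named in the abstract, the multiplicative rule is the standard one and can be sketched briefly. The following is a minimal illustration that assumes the usual Lee-Seung updates for the Frobenius-norm objective; the function names, the random initialization, and the `eps` safeguard are assumptions of this sketch, and the firefly rule, the survival-of-the-fittest rule, and the composite fitness score are not reproduced here.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V (n x m) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius objective.
    Random initialization is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Element-wise updates keep W and H nonnegative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def relative_error(V, W, H):
    """Factorization error ||V - WH||_F / ||V||_F."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Storing `W` and `H` instead of `V` takes `rank * (n + m)` values rather than `n * m`, which is where the compression comes from when `rank` is small.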
Cluster Exploration using Informative Manifold Projections
Dimensionality reduction (DR) is one of the key tools for the visual
exploration of high-dimensional data and uncovering its cluster structure in
two- or three-dimensional spaces. The vast majority of DR methods in the
literature do not take into account any prior knowledge a practitioner may have
regarding the dataset under consideration. We propose a novel method to
generate informative embeddings which not only factor out the structure
associated with different kinds of prior knowledge but also aim to reveal any
remaining underlying structure. To achieve this, we employ a linear combination
of two objectives: firstly, contrastive PCA that discounts the structure
associated with the prior information, and secondly, kurtosis projection
pursuit which ensures meaningful data separation in the obtained embeddings. We
formulate this task as a manifold optimization problem and validate it
empirically across a variety of datasets considering three distinct types of
prior knowledge. Lastly, we provide an automated framework to perform iterative
visual exploration of high-dimensional data.
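The two ingredients named in the abstract can each be illustrated on their own. The sketch below assumes the common formulation of contrastive PCA, in which the prior knowledge is represented by a background dataset whose covariance is subtracted from that of the target data, and it uses excess kurtosis as the projection-pursuit index. The function names and the `alpha` weight are assumptions of this sketch, and the paper's combined manifold optimization is not reproduced.

```python
import numpy as np

def contrastive_pca(X_target, X_background, alpha=1.0, n_components=2):
    """Return directions with high target variance and low background variance.

    Takes the top eigenvectors of C_target - alpha * C_background.
    """
    Xt = X_target - X_target.mean(axis=0)
    Xb = X_background - X_background.mean(axis=0)
    C_t = Xt.T @ Xt / (len(Xt) - 1)
    C_b = Xb.T @ Xb / (len(Xb) - 1)
    # eigh handles the symmetric (possibly indefinite) contrast matrix.
    evals, evecs = np.linalg.eigh(C_t - alpha * C_b)
    order = np.argsort(evals)[::-1][:n_components]
    return evecs[:, order]

def excess_kurtosis(z):
    """Excess kurtosis of a 1-D projection; low values suggest bimodality."""
    z = z - z.mean()
    return np.mean(z**4) / np.mean(z**2) ** 2 - 3.0
```

A projection `X_target @ W[:, 0]` with strongly negative excess kurtosis would hint at two well-separated groups along that direction, which is the kind of separation a kurtosis-based pursuit looks for.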
Quantifying the Informativeness of Similarity Measurements
In this paper, we describe an unsupervised measure for quantifying the 'informativeness' of correlation matrices formed from the pairwise similarities or relationships among data instances. The measure quantifies the heterogeneity of the correlations and is defined as the distance between a correlation matrix and the nearest correlation matrix with constant off-diagonal entries. This non-parametric notion generalizes existing test statistics for equality of correlation coefficients by allowing for alternative distance metrics, such as the Bures and other distances from quantum information theory. For several distance and dissimilarity metrics, we derive closed-form expressions of informativeness, which can be applied as objective functions for machine learning applications. Empirically, we demonstrate that informativeness is a useful criterion for selecting kernel parameters, choosing the dimension for kernel-based nonlinear dimensionality reduction, and identifying structured graphs. We also consider the problem of finding a maximally informative correlation matrix around a target matrix, and explore parameterizing the optimization in terms of the coordinates of the sample or through a lower-dimensional embedding. In the latter case, we find that maximizing the Bures-based informativeness measure, which is maximal for centered rank-1 correlation matrices, is equivalent to minimizing a specific matrix norm, and present an algorithm to solve the minimization problem using the norm's proximal operator. The proposed correlation denoising algorithm consistently improves spectral clustering. Overall, we find informativeness to be a novel and useful criterion for identifying non-trivial correlation structure.
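The definition above can be made concrete for one particular choice of distance. The sketch below assumes the Frobenius norm, for which the nearest correlation matrix with constant off-diagonal entries is obtained by replacing every off-diagonal entry with their mean. The function name is an assumption of this sketch, and the Bures-based and other closed-form expressions mentioned in the abstract are not reproduced.

```python
import numpy as np

def frobenius_informativeness(C):
    """Frobenius distance from correlation matrix C to the nearest
    correlation matrix with constant off-diagonal entries."""
    n = C.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    # Averaging the off-diagonal entries is the least-squares projection
    # onto matrices of the form (1 - a) * I + a * ones.
    a = C[off_diag].mean()
    nearest = np.full((n, n), a)
    np.fill_diagonal(nearest, 1.0)
    return np.linalg.norm(C - nearest, ord="fro")
```

Under this choice, an identity matrix or any equicorrelation matrix scores zero, while a matrix with block structure scores higher because its correlations are heterogeneous.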
Memory-Aware Attentive Control for Community Question Answering With Knowledge-Based Dual Refinement
Process Mining Algorithm for Online Intrusion Detection System
In this paper, we consider the applications of process mining in intrusion
detection. We propose a novel process mining inspired algorithm to be used to
preprocess data in intrusion detection systems (IDS). The algorithm is designed
to process the network packet data and it works well in online mode for online
intrusion detection. To test our algorithm, we used the CSE-CIC-IDS2018 dataset
which contains several common attacks. The packet data was preprocessed with
this algorithm and then fed into the detectors. We report experiments that pair
the algorithm with different machine learning (ML) classifiers to verify that it
works as expected. We also tested its performance with anomaly detection methods
and, for comparison, report results for the existing preprocessing tool
CICFlowMeter.